Smart Paper Notes

Full Body Video-Based Self-Avatars for Mixed Reality: from E2E System to User Study

Diego Gonzalez Morin , Ester Gonzalez-Sosa , Pablo Perez , Alvaro Villegas

Category: Computer Vision

2022-08-24

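In this work we explore the creation of self-avatars through video pass-through in Mixed Reality (MR) applications. We present our end-to-end system, which comprises a custom MR video pass-through implementation on a commercial head-mounted display (HMD), our deep learning-based real-time egocentric body segmentation algorithm, and our optimized offloading architecture for communication between the segmentation server and the HMD. To validate this technology, we designed an immersive VR experience in which the user has to walk along a narrow tiled path over the crater of an active volcano. The study was performed under three body representation conditions: virtual hands, video pass-through with color-based full-body segmentation, and video pass-through with deep learning full-body segmentation. The immersive experience was carried out by 30 women and 28 men. To the best of our knowledge, this is the first user study focused on evaluating video-based self-avatars to represent the user in an MR scene. Results showed no significant differences between the body representations in terms of presence, with moderate improvements in some embodiment components between the virtual hands and the full-body representations. Visual quality results showed better results for the deep learning algorithm in terms of whole-body perception and overall segmentation quality. We provide some discussion on the use of video-based self-avatars and some reflections on the evaluation methodology. The proposed E2E solution sits at the boundary of the state of the art, so there is still room for improvement before it reaches maturity. Nevertheless, this solution serves as a crucial starting point for novel distributed MR solutions.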

translated by Google Translate

Real Time Egocentric Segmentation for Video-self Avatar in Mixed Reality

Ester Gonzalez-Sosa , Andrija Gajic , Diego Gonzalez-Morin , Guillermo Robledo , Pablo Perez , Alvaro Villegas

Category: Computer Vision

2022-07-04

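In this work we present our real-time egocentric body segmentation algorithm. Thanks to a shallow network inspired by the Thundernet architecture, our algorithm achieves a frame rate of 66 fps for an input resolution of 640x480. In addition, we put strong emphasis on the variability of the training data. More concretely, we describe the creation process of our Egocentric Bodies (EgoBodies) dataset, composed of almost 10,000 images from three datasets, created both by synthetic methods and by real capture. We conduct experiments to understand the contribution of the individual datasets, compare the Thundernet model trained with EgoBodies against simpler and more complex previous approaches, and discuss their corresponding performance in a real-life setup in terms of segmentation quality and inference time. The trained semantic segmentation algorithm described here is already integrated into an end-to-end system for Mixed Reality (MR), making it possible for users to see their own body while immersed in an MR scene.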
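
As a small worked example of what the reported figures imply, the sketch below converts the stated 66 fps into a per-frame time budget and a pixel throughput at 640x480. The function name `frame_budget_ms` is hypothetical and only illustrates the arithmetic; it is not taken from the paper.

```python
def frame_budget_ms(fps: float) -> float:
    """Return the time available per frame, in milliseconds, at a given frame rate."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


# Figures quoted in the abstract: 66 fps at a 640x480 input resolution.
FPS = 66.0
WIDTH, HEIGHT = 640, 480

budget_ms = frame_budget_ms(FPS)          # roughly 15.15 ms per frame
pixels_per_frame = WIDTH * HEIGHT         # 307200 pixels
pixels_per_second = pixels_per_frame * FPS

print(f"per-frame budget: {budget_ms:.2f} ms")
print(f"pixels per frame: {pixels_per_frame}")
print(f"pixels per second: {pixels_per_second:.0f}")
```

Under these numbers, segmentation of one frame has to finish in about 15 ms to sustain the stated rate.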


Album cover art image generation with Generative Adversarial Networks

Felipe Perez Stoppa , Ester Vidaña-Vila , Joan Navarro

Category: Computer Vision | Artificial Intelligence | Machine Learning

2022-12-09

Generative Adversarial Networks (GANs) were introduced by Goodfellow in 2014, and since then have become popular for constructing generative artificial intelligence models. However, the drawbacks of such networks are numerous, like their longer training times, their sensitivity to hyperparameter tuning, several types of loss and optimization functions and other difficulties like mode collapse. Current applications of GANs include generating photo-realistic human faces, animals and objects. However, I wanted to explore the artistic ability of GANs in more detail, by using existing models and learning from them. This dissertation covers the basics of neural networks and works its way up to the particular aspects of GANs, together with experimentation and modification of existing available models, from least complex to most. The intention is to see if state of the art GANs (specifically StyleGAN2) can generate album art covers and if it is possible to tailor them by genre. This was attempted by first familiarizing myself with 3 existing GANs architectures, including the state of the art StyleGAN2. The StyleGAN2 code was used to train a model with a dataset containing 80K album cover images, then used to style images by picking curated images and mixing their styles.


Active learning using adaptable task-based prioritisation

Shaheer U. Saeed , João Ramalhinho , Mark Pinnock , Ziyi Shen , Yunguan Fu , Nina Montaña-Brown , Ester Bonmati , Dean C. Barratt , Stephen P. Pereira , Brian Davidson

Category: Computer Vision

2022-12-03

Supervised machine learning-based medical image computing applications necessitate expert label curation, while unlabelled image data might be relatively abundant. Active learning methods aim to prioritise a subset of available image data for expert annotation, for label-efficient model training. We develop a controller neural network that measures priority of images in a sequence of batches, as in batch-mode active learning, for multi-class segmentation tasks. The controller is optimised by rewarding positive task-specific performance gain, within a Markov decision process (MDP) environment that also optimises the task predictor. In this work, the task predictor is a segmentation network. A meta-reinforcement learning algorithm is proposed with multiple MDPs, such that the pre-trained controller can be adapted to a new MDP that contains data from different institutes and/or requires segmentation of different organs or structures within the abdomen. We present experimental results using multiple CT datasets from more than one thousand patients, with segmentation tasks of nine different abdominal organs, to demonstrate the efficacy of the learnt prioritisation controller function and its cross-institute and cross-organ adaptability. We show that the proposed adaptable prioritisation metric yields converging segmentation accuracy for the novel class of kidney, unseen in training, using between approximately 40% and 60% of labels otherwise required with other heuristic or random prioritisation metrics. For clinical datasets of limited size, the proposed adaptable prioritisation offers a performance improvement of 22.6% and 10.2% in Dice score, for tasks of kidney and liver vessel segmentation, respectively, compared to random prioritisation and alternative active sampling strategies.
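
The improvements above are reported in Dice score. As a reminder of how that metric is usually defined for binary masks, here is a minimal sketch in plain Python. It is the standard formula 2|A∩B| / (|A| + |B|) on flattened masks, not the authors' evaluation code, and the convention of returning 1.0 when both masks are empty is an assumption.

```python
from typing import Sequence


def dice_score(pred: Sequence[int], target: Sequence[int]) -> float:
    """Dice overlap between two flattened binary masks (0 = background, 1 = foreground)."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of elements")
    # Count pixels that are foreground in both masks.
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    # Total foreground pixels across the two masks.
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    if total == 0:
        # Assumed convention: two empty masks agree perfectly.
        return 1.0
    return 2.0 * intersection / total


# One shared foreground pixel, three foreground pixels in total: 2 * 1 / 3.
print(dice_score([1, 1, 0, 0], [1, 0, 0, 0]))
```

For multi-class segmentation, a common choice is to compute this per organ class and report each value separately, as the abstract does for kidney and liver vessels.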


Subgroup Discovery in Unstructured Data

Ali Arab , Dev Arora , Jialin Lu , Martin Ester

Category: Machine Learning

2022-07-15

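Subgroup discovery is a descriptive and exploratory data mining technique that identifies subgroups in a population that exhibit interesting behavior with respect to a variable of interest. Subgroup discovery has numerous applications in knowledge discovery and hypothesis generation, yet it remains inapplicable to unstructured, high-dimensional data such as images. This is because subgroup discovery algorithms rely on defining descriptive rules based on (attribute, value) pairs, whereas in unstructured data an attribute is not well defined. Even where a notion of attribute exists in the data (such as a pixel in an image), the high dimensionality of the data means these attributes are not informative enough to be used in a rule. In this paper, we introduce the subgroup-aware variational autoencoder, a novel variational autoencoder that learns a representation of unstructured data which leads to subgroups of higher quality. Our experimental results demonstrate the effectiveness of the method at learning high-quality subgroups while supporting the interpretability of the concepts.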


Western Mediterranean wetlands bird species classification: evaluating small-footprint deep learning approaches on a new annotated dataset

Juan Gómez-Gómez , Ester Vidaña-Vila , Xavier Sevillano

Category: Artificial Intelligence

2022-07-12

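The deployment of an expert system running over a wireless acoustic sensor network made up of bioacoustic monitoring devices that recognise bird species from their sounds would enable the automation of many tasks of ecological value, including the analysis of bird population composition or the detection of endangered species in areas of environmental interest. Endowing these devices with accurate audio classification capabilities is possible thanks to the latest advances in artificial intelligence, among which deep learning techniques excel. However, a key issue in making bioacoustic devices affordable is the use of small-footprint deep neural networks that can be embedded in resource- and battery-constrained hardware platforms. For this reason, this work presents a critical comparative analysis between two heavy, large-footprint deep neural networks (VGG16 and ResNet50) and a lightweight alternative, MobileNetV2. Our experimental results reveal that MobileNetV2 achieves an average F1-score less than 5% below that of ResNet50 (0.789 vs. 0.834) and performs better than VGG16, with a footprint nearly 40 times smaller. Moreover, to compare the models, we have created and made public the Western Mediterranean Wetland Birds dataset, consisting of 201.6 minutes and 5,795 audio excerpts of 20 endemic bird species of the Aiguamolls de l'Empordà Natural Park.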
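
The comparison above is stated in terms of an average F1-score. The abstract does not say which averaging is used; the sketch below assumes an unweighted (macro) average over classes, which is a common choice for multi-class audio classification. The function name `macro_f1` and the toy labels are illustrative only.

```python
from typing import Hashable, Sequence


def macro_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Unweighted mean of per-class F1 scores (macro averaging, assumed here)."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    labels = set(y_true) | set(y_pred)
    scores = []
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy example with two hypothetical classes "a" and "b".
truth = ["a", "a", "b", "b"]
preds = ["a", "b", "b", "b"]
print(macro_f1(truth, preds))
```

In the toy example, class "a" has F1 = 2/3 and class "b" has F1 = 4/5, so the macro average is their mean.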


A Comprehensive Survey on Deep Clustering: Taxonomy, Challenges, and Future Directions

Sheng Zhou , Hongjia Xu , Zhuonan Zheng , Jiawei Chen , Zhao li , Jiajun Bu , Jia Wu , Xin Wang , Wenwu Zhu , Martin Ester

Category: Machine Learning | Artificial Intelligence

2022-06-15

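Clustering is a fundamental machine learning task that has been widely studied in the literature. Classic clustering methods assume that the data are represented as features in vectorized form, obtained through various representation learning techniques. As data become increasingly complex, shallow (traditional) clustering methods can no longer handle high-dimensional data types. With the huge success of deep learning, especially deep unsupervised learning, many representation learning techniques with deep architectures have been proposed in the past decade. Recently, the concept of deep clustering, i.e., jointly optimizing representation learning and clustering, has been proposed and has attracted growing attention in the community. Motivated by the tremendous success of deep learning in clustering, one of the most fundamental machine learning tasks, and by the large number of recent advances in this direction, this paper presents a comprehensive survey of deep clustering, with a taxonomy of state-of-the-art approaches. We summarize the essential components of deep clustering and categorize existing methods by the way they design the interaction between deep representation learning and clustering. Moreover, this survey also provides popular benchmark datasets, evaluation metrics, and open-source implementations to clearly illustrate various experimental settings. Last but not least, we discuss the practical applications of deep clustering and suggest challenging topics that deserve further investigation as future directions.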


M3E2: Multi-gate Mixture-of-experts for Multi-treatment Effect Estimation

Raquel Aoki , Yizhou Chen , Martin Ester

Category: Machine Learning | Machine Learning (Statistics)

2021-12-14

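This work proposes M3E2, a multi-task learning neural network model to estimate the effect of multiple treatments. In contrast to existing methods, M3E2 is robust to multiple treatment effects applied simultaneously to the same unit, to continuous and binary treatments, and to a large number of covariates. We compare M3E2 with three baselines on three benchmark datasets: two with multiple treatments and one with a single treatment. Our analysis shows that our method has superior performance, producing more assertive estimates of the true treatment effects. The code is available at github.com/raquelaoki/m3e2.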


An Interactive Visualization Tool for Understanding Active Learning

Zihan Wang , Jialin Lu , Oliver Snow , Martin Ester

Category: Machine Learning

2021-11-09

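Despite recent progress in artificial intelligence and machine learning, many state-of-the-art methods suffer from a lack of explainability and transparency. The ability to interpret the predictions made by machine learning models and to evaluate these models accurately is crucially important. In this paper, we present an interactive visualization tool to elucidate the training process of active learning. The tool enables one to select a sample of interesting data points and view how their prediction values change at different querying stages, and thus better understand when and how active learning works. Additionally, users can use the tool to compare different active learning strategies simultaneously and inspect why some strategies outperform others in certain contexts. With some preliminary experiments, we demonstrate that our visualization panel has great potential to be used in various active learning experiments and to help users evaluate their models appropriately.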
